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Recent investigations have demonstrated the 
effectiveness of a contextual classifier that combines
spatial and spectral information employing a general
statistical approach.¹,² This statistical classification
algorithm exploits the tendency of certain ground-cover
classes to occur more frequently in some spatial contexts
than in others. Indeed, a key input to this algorithm is a
statistical characterization of the context: the context
distribution. Here we discuss an unbiased estimator of the
context distribution which, besides having the advantage of
statistical unbiasedness, has the additional advantage over
other estimation techniques of being amenable to an adaptive
implementation in which the context distribution estimate
varies according to local contextual information. Results
from applying the unbiased estimator to the contextual
classification of three real Landsat data sets are presented
and contrasted with results from non-contextual
classifications and from contextual classifications
utilizing other context distribution estimation techniques.


I. INTRODUCTION

The machine classification of multispectral image data
collected by remote sensing devices aboard aircraft and
spacecraft has usually been performed such that each pixel
(picture element) is classified individually and
independently.³ The information used by this classifier is
only spectral or, in some cases, spectral and temporal.
There is no provision for using the spatial information
inherent in the data. In contrast, when scanner data are
displayed in image form, a human analyst routinely uses
spatial information to establish a context for deciding
what a particular pixel in the imagery might be. Using this
context together with spectral information, the analyst may
easily identify roads, delineate boundaries of agricultural
fields, and differentiate between grass in an urban setting
(e.g., lawns) and grass in an agricultural setting (e.g.,
pasture or forage crops) where a point-by-point classifier
utilizing spectral information alone would have much
difficulty in doing so.

The ECHO (Extraction and Classification of Homogeneous
Objects) process is a variety of contextual classifier
which has been found useful for classifying data sets which
contain homogeneous objects that are large compared to the
resolution of the imagery.⁴ This classifier cannot be used
effectively, however, if the data set does not contain a
significant number of these large homogeneous objects.

In several recent papers¹,²,⁵⁻⁸ we have described a general
statistical classification method for exploiting both
spatial and spectral information when classifying
multispectral image data. This contextual classifier
exploits the tendency alluded to earlier of certain
ground-cover classes to occur more frequently in some
contexts than in others. Unlike the ECHO process, this
classifier can be used to advantage on any data set, even
those data sets that do not have identifiable homogeneous
objects, such as is generally the case in forested, urban
and other inhomogeneous areas.

We shall briefly review the statistical basis of the
contextual decision rule and earlier methods for estimating
a statistical characterization of context: the context
distribution. We will then describe an unbiased estimator
of the context distribution. Besides having the advantage
of statistical unbiasedness, this estimator has the
additional advantage over other estimation techniques of
being amenable to an adaptive implementation in which the
context distribution estimate varies according to local
contextual information. Results from applying the unbiased
estimator to the contextual classification of three real
Landsat data sets are then presented and contrasted with
results from non-contextual classifications and from
contextual classifications utilizing other context
distribution estimation techniques.


II. THEORETICAL BASIS OF THE CLASSIFIER

Consistent with the general characteristics of imaging
systems for remote sensing, we assume a two-dimensional
array of $N = N_1 \times N_2$ random observations (pixels)
$X_{ij}$ having fixed but unknown classification
$\theta_{ij}$, as shown in Figure 1. The observation
$X_{ij}$ consists of $n$ measurements (usually containing
spectral and/or temporal information), while the
classification $\theta_{ij}$ can be any one of $m$ spectral
or information classes* from the set
$\Omega = \{\omega_1, \omega_2, \ldots, \omega_m\}$.

Figure 1. A two-dimensional array of $N = N_1 \times N_2$
pixels.

Let $\mathbf{X}$ denote a vector whose components are the
random observations:

$$\mathbf{X} = [X_{ij} \mid i = 1, 2, \ldots, N_1;\ j = 1, 2, \ldots, N_2]^T.$$

Similarly, let $\boldsymbol{\theta}$ be the vector of states
(true classifications) associated with the observations:

$$\boldsymbol{\theta} = [\theta_{ij} \mid i = 1, 2, \ldots, N_1;\ j = 1, 2, \ldots, N_2]^T.$$

Figure 2. Examples of p-context arrays, including a p=5
choice.

The following notation will be useful. Let
$\boldsymbol{\theta}^p \in \Omega^p$ and
$\mathbf{X}^p \in (R^n)^p$ stand for p-vectors of classes
and n-dimensional measurements, respectively; each
component of $\boldsymbol{\theta}^p$ is a variable which can
take on any classification value; each component of
$\mathbf{X}^p$ is an n-dimensional random vector which can
take on values in the observation space.

Let the action (classification) taken with respect to pixel
(i,j) be denoted by $a_{ij} \in \Omega$. We restrict the
action $a_{ij}$ to be a function of a specified subset of
observations in $\mathbf{X}$. This subset includes, along
with $X_{ij}$, p-1 observations spatially near to, but not
necessarily adjacent to, $X_{ij}$. These p-1 observations
serve as the spatial context for $X_{ij}$ and are taken
from the same spatial positions relative to pixel position
(i,j) for all i and j. Call this arrangement of pixels
together with $X_{ij}$ the p-context array, several
examples of which are shown in Figure 2. Group the p
observations in the p-context array into a vector of
observations $\mathbf{X}_{ij} = (X_1, X_2, \ldots, X_p)^T$
and let $\boldsymbol{\theta}_{ij}$ be the vector of true but
unknown classifications associated with the observations in
$\mathbf{X}_{ij}$. Note that $\boldsymbol{\theta}_{ij}$ and
$\mathbf{X}_{ij}$ are the particular instances of
$\boldsymbol{\theta}^p$ and $\mathbf{X}^p$ associated with
pixel position (i,j). Correspondence of the components of
$\boldsymbol{\theta}_{ij}$, $\mathbf{X}_{ij}$,
$\boldsymbol{\theta}^p$ and $\mathbf{X}^p$ to the positions
in the p-context array is fixed but arbitrary except that
the p-th components will always correspond to the pixel to
be classified.

Let the loss suffered by taking action $a_{ij}$ be denoted
by $\lambda(\theta_{ij}, a_{ij})$ for some fixed
non-negative function $\lambda(\cdot, \cdot)$. The expected
average loss (or risk) suffered over the N classifications
in the classification array is

$$R_{\boldsymbol{\theta}} = E\left[ \frac{1}{N} \sum_{i,j} \lambda\big(\theta_{ij}, a_{ij}(\mathbf{X})\big) \right] \tag{1}$$

where the expectation is with respect to the distribution
of $\mathbf{X}$.


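Now consider finding a decision rule of the form

$$a_{ij}(\mathbf{X}) = d(\mathbf{X}_{ij}) \tag{2}$$

for a fixed function $d(\cdot)$ mapping p-vectors of
observations to actions so that $R_{\boldsymbol{\theta}}$ is
minimized. If we require that the distributions of the
$\mathbf{X}_{ij}$ are spatially invariant, i.e., the value
of the probability density for $\mathbf{X}_{ij}$ depends
only on the measurement values in $\mathbf{X}_{ij}$ and the
set of classifications in $\boldsymbol{\theta}_{ij}$ and not
the location (i,j), the risk $R_{\boldsymbol{\theta}}$ can
be written as

$$\begin{aligned}
R_{\boldsymbol{\theta}} &= \sum_{\boldsymbol{\theta}^p \in \Omega^p} C(\boldsymbol{\theta}^p) \int \lambda\big(\theta_p, d(\mathbf{X}^p)\big)\, f(\mathbf{X}^p \mid \boldsymbol{\theta}^p)\, d\mathbf{X}^p \\
&= \int \sum_{\boldsymbol{\theta}^p \in \Omega^p} C(\boldsymbol{\theta}^p)\, \lambda\big(\theta_p, d(\mathbf{X}^p)\big)\, f(\mathbf{X}^p \mid \boldsymbol{\theta}^p)\, d\mathbf{X}^p
\end{aligned} \tag{3}$$

where $C(\boldsymbol{\theta}^p)$, the context distribution,
is the relative frequency with which
$\boldsymbol{\theta}^p$ occurs in the array
$\boldsymbol{\theta}$, and $\theta_p$ is the p-th element of
$\boldsymbol{\theta}^p$. For any array
$\boldsymbol{\theta}$, a decision rule $d(\mathbf{X}^p)$
minimizing $R_{\boldsymbol{\theta}}$ can be obtained by
minimizing the integrand in (3) for each $\mathbf{X}^p$;
thus for a specific $\mathbf{X}_{ij}$ (an instance of
$\mathbf{X}^p$), an optimal action is:

$d(\mathbf{X}_{ij})$ = the action (classification) $a$ which minimizes

$$\sum_{\boldsymbol{\theta}^p \in \Omega^p} C(\boldsymbol{\theta}^p)\, \lambda(\theta_p, a)\, f(\mathbf{X}_{ij} \mid \boldsymbol{\theta}^p). \tag{4}$$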


In practice, a "0-1 loss function" is usually assumed,
i.e.,

$$\lambda(\theta, a) = \begin{cases} 0, & \text{if } \theta = a \\ 1, & \text{if } \theta \neq a. \end{cases}$$

Then (4) simplifies and the decision rule becomes:

$d(\mathbf{X}_{ij})$ = the action $a$ which maximizes

$$\sum_{\boldsymbol{\theta}^p \in \Omega^p,\ \theta_p = a} C(\boldsymbol{\theta}^p)\, f(\mathbf{X}_{ij} \mid \boldsymbol{\theta}^p). \tag{5}$$

* Spectral classes are spectrally differentiable subclasses
of information classes (the classes of interest).

We now assume class-conditional independence for the
observations. This assumption means that the joint
class-conditional density over the p-context array can be
written as

$$f(\mathbf{X}_{ij} \mid \boldsymbol{\theta}_{ij}) = \prod_{k=1}^{p} f(X_k \mid \theta_k) \tag{6}$$

where $X_k$ and $\theta_k$ are the k-th elements of
$\mathbf{X}_{ij}$ and $\boldsymbol{\theta}_{ij}$,
respectively. Evidence that this is a reasonable assumption
may be found in Yamamoto.¹¹ With this assumption, the
decision rule in (5) becomes:

$d(\mathbf{X}_{ij})$ = the action $a$ which maximizes

$$\sum_{\boldsymbol{\theta}^p \in \Omega^p,\ \theta_p = a} C(\boldsymbol{\theta}^p) \prod_{k=1}^{p} f(X_k \mid \theta_k). \tag{7}$$

A more detailed derivation of this decision rule can be
found in Swain, et al.¹
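
As an illustration of rule (7), the following minimal Python sketch scores each candidate action by summing, over all class configurations whose p-th component equals that action, the context weight times the product of per-pixel class-conditional densities. The function name, the dictionary representation of the context distribution, and the use of integer class indices are assumptions made for this sketch; they are not taken from the original implementation.

```python
import numpy as np

def contextual_decision(likelihoods, context_dist):
    """Return the class index a that maximizes the sum in rule (7).

    likelihoods: array of shape (p, m); likelihoods[k, c] = f(X_k | class c)
        for the k-th pixel of the p-context array. By convention the last
        row belongs to the pixel being classified.
    context_dist: dict mapping p-tuples of class indices to C(theta^p)
        (hypothetical representation; absent configurations count as zero).
    """
    p, m = likelihoods.shape
    scores = np.zeros(m)
    for config, weight in context_dist.items():
        # Class-conditional independence (6): the joint density is a product.
        joint = np.prod([likelihoods[k, config[k]] for k in range(p)])
        # The p-th (last) component of the configuration is the action a.
        scores[config[-1]] += weight * joint
    return int(np.argmax(scores))
```

With a uniform context distribution the rule reduces to the non-contextual maximum likelihood choice for the center pixel; a context distribution that favors like-class neighbors can overturn that choice when the neighbor's evidence is strong.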

The optimal choice of $d(\cdot)$ cannot be implemented in
practice since it depends on $C(\boldsymbol{\theta}^p)$ and
the $f(X_k \mid \theta_k)$, which are unknown. Methods for
estimating the $f(X_k \mid \theta_k)$ are well established
from considerable experience in using the conventional
non-contextual maximum likelihood decision rule.³ When the
classification set $\Omega$ consists of spectral classes,
the $f(X_k \mid \theta_k)$ are assumed to be multivariate
normal densities. In the case where the classification set
$\Omega$ consists of information classes, the
$f(X_k \mid \theta_k)$ are assumed to be weighted sums of
multivariate normal densities. We will next discuss methods
for estimating the context distribution,
$C(\boldsymbol{\theta}^p)$.


III. CONTEXT DISTRIBUTION ESTIMATION: 
EARLIER TECHNIQUES 

Simulated data sets were utilized in the earliest 
experiments exploring the effectiveness of classifying 
multispectral remote sensing data using context 
classification as defined by the set of discriminant 
functions in (7). This was done to demonstrate the 
effectiveness of the classifier given that the underlying 
assumptions in the classification model are satisfied. 
At first, the context distribution was found by simple
tabulation from the true classification used as a template
for the data simulation. As reported in Swain, et al.,¹
the classifier was very effective when the context
distribution was determined in this way.

When dealing with real data, there is no direct way 
of determining the context distribution. We cannot 
tabulate the context distribution from the true 
classification since the true classification is not known. 
However, we do expect that, at least for large
$N = N_1 N_2$, the decision rule in (5) where
$C(\boldsymbol{\theta}^p)$ is replaced by an estimate
$\hat{C}(\boldsymbol{\theta}^p)$ based on the data,
$\mathbf{X}$, will have risk $R_{\boldsymbol{\theta}}$
approximating that of the optimal
rule. Thus we should be able to base an adequate esti- 
mate of the context distribution on the data or, more 
practically, on representative sections from the data 
designated as a training set. The most straightforward 
way to develop an estimate of the context distribution 
from the training set would be to perform a conven- 
tional non-contextual classification of the training set 
and use the context distribution as tabulated from this 
classification as an estimate of the context distribu- 
tion. One could then further refine this estimate of the 
context distribution by making another estimate from 
the contextual classification, and even iterate in this 
way until no further improvement in classification 
accuracy was obtained. 
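
The tabulation step of this classify-and-count procedure can be sketched as follows. This is a minimal illustration, assuming a four-nearest-neighbor (p=5) context array, integer class labels in a two-dimensional array, and that border pixels lacking a complete context are simply skipped; the offset ordering and the function name are choices made for the sketch, not details from the original work.

```python
from collections import Counter
import numpy as np

# (row, col) offsets of a p=5 context array: the four nearest neighbors,
# with the pixel to be classified placed last (assumed ordering).
FOUR_NN_OFFSETS = [(-1, 0), (0, -1), (0, 1), (1, 0), (0, 0)]

def tabulate_context(labels, offsets=FOUR_NN_OFFSETS):
    """Relative frequency of each class configuration in a label map."""
    labels = np.asarray(labels)
    rows, cols = labels.shape
    counts = Counter()
    for i in range(rows):
        for j in range(cols):
            positions = [(i + di, j + dj) for di, dj in offsets]
            # Skip pixels whose context array falls off the image (assumption).
            if all(0 <= r < rows and 0 <= c < cols for r, c in positions):
                counts[tuple(int(labels[r, c]) for r, c in positions)] += 1
    total = sum(counts.values())
    return {config: n / total for config, n in counts.items()}
```

Applied to a non-contextual classification map this gives the first classify-and-count estimate; applying it again to the resulting contextual classification map gives the next iterate.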

This iterative "classify-and-count" method was 
tested on one simulated data set and two real data 
sets. As reported in Swain, et al.,¹ this method gave
excellent results on the simulated data set, but
disappointing results on the real data sets, stimulating a
search for alternative methods for estimating the context
distribution. One such method is the ground-truth-guided
method. In this method, roughly equal
subsets of the ground truth data are designated as a 
training set for estimating the context distribution and 
a test set for evaluating the classification results. The 
ground truth data are, of course, represented in terms 
of information classes. When the estimation is to be 
done in terms of spectral classes rather than informa- 
tion classes, the following method is used: 

(1) Perform a conventional non-contextual
classification of the training set using uniform prior
probabilities, but allow the classifier to choose only
among spectral classes associated with the information
class designated by the ground truth.

(2) Estimate the context distribution by tabulation 
from the resulting 100-percent accurate classification 
of the training set. 

(3) Classify the entire scene with the contextual 
classifier and evaluate the results over a test set dis- 
joint from the training set. 

When the estimation is to be done in terms of informa- 
tion classes, the restricted spectral class classification 
in step (1) above must still be performed. In this case, 
however, this classification is used to provide (by
tabulation) an estimate of the weights used in the weighted
sum of class-conditional normal densities that make
up the set of densities $f(X_k \mid \theta_k)$ in (7). Each
weight is
the relative frequency of occurrence in the training set 
of a particular spectral class for a given information 
class. The entire scene is then classified in terms of 
information classes using the contextual classifier, and 
evaluated over a test set disjoint from the training set, 
as in the spectral class case. 

Both the spectral and information class formulations of the
ground-truth-guided method were tested on two
50-pixel-square Landsat data sets. One data set was a LACIE
data set from Hodgeman County, Kansas, containing pasture,
wheat, corn and fallow fields. The other data set was from
Tippecanoe County, Indiana, containing residential and
commercial areas in northern Lafayette and West Lafayette,
Indiana, as well as areas of forest, agriculture and water
(the Wabash River). For both data sets, the restricted
spectral class classification was performed over the first
25 lines of the data set and the context distribution was
estimated over those 25 lines. Contextual classifications
of the scenes were performed and classification accuracies*
were evaluated over the last 25 lines as well as over the
entire data set.

Tables 1 and 2 present the results from contextual
classifications using four-nearest-neighbor (4nn) estimates
of the context distribution (the p=5 choice in Figure 2)
for both the spectral and information class formulations of
the ground-truth-guided method (gtgm). These results are
also compared to the accuracies obtained from
uniform-priors and estimated-priors non-contextual maximum
likelihood classifications. The prior probabilities for the
estimated-priors non-contextual classifications were
estimated by tabulation from the uniform-priors
non-contextual classification. These results show that
contextual classifications using the ground-truth-guided
method for estimating the context distribution give
significantly better results than non-contextual
classifications on these data sets. For these cases, the
spectral class formulation of the ground-truth-guided
method generally produces higher classification accuracies.
However, since the spectral class estimate of the context
distribution has substantially more non-zero elements than
the information class estimate, contextual classifications
using the spectral class formulation generally take over
twice the computer time required for the information class
formulation.

* Classification accuracy can be tabulated in two ways.
Overall accuracy is simply the overall number of correct
classifications divided by the total number attempted.
Average-by-class accuracy is obtained by first computing
the accuracy for each class and then taking the arithmetic
average of the class accuracies. The latter is significant
when the classification results exhibit a tendency to
discriminate in favor of or against a subset of the
classes.
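
The two accuracy measures used in the tables can be computed as in the short sketch below. The function names are illustrative, and the sketch assumes that every class of interest appears at least once in the reference labels.

```python
import numpy as np

def overall_accuracy(truth, predicted):
    """Correct classifications divided by the total number attempted."""
    truth, predicted = np.asarray(truth), np.asarray(predicted)
    return float(np.mean(truth == predicted))

def average_by_class_accuracy(truth, predicted):
    """Arithmetic average of the per-class accuracies."""
    truth, predicted = np.asarray(truth), np.asarray(predicted)
    per_class = [np.mean(predicted[truth == c] == c) for c in np.unique(truth)]
    return float(np.mean(per_class))
```

For example, if 8 of 10 test pixels belong to one class and the classifier assigns every pixel to that class, the overall accuracy is 0.8 while the average-by-class accuracy is only 0.5, which exposes the discrimination against the minority class.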

While this method can produce good estimates of 
the context distribution, it suffers the limitation that it 
requires large areas of spatially contiguous ground 
truth data. When such detailed ground truth data are 
not available, some other method is needed. 


Table 1. Comparison of the contextual classifier using the ground-truth- 
guided method with non-contextual classifiers; Hodgeman County, Kansas, 
Landsat Data Set. 



Table 2. Comparison of the contextual classifier using the ground-truth-
guided method with non-contextual classifiers; Tippecanoe County, Indiana,
Landsat Data Set.

The "Power Method" was the next method investigated as a
generally applicable method for estimating the context
distribution. To employ the method, one raises the relative
frequency count for each class configuration to a power and
uses the result as the context distribution estimate. This
method is described in detail in Tilton, et al. The context
distribution estimates generated by the Power Method can
produce classification accuracies of roughly the same high
level as produced by the ground-truth-guided method.
However, the method is very inconvenient to use.
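
The core operation of the Power Method, raising each tabulated relative frequency to a power, can be sketched in a few lines. The renormalization to unit sum is an assumption of this sketch (it does not change which action maximizes the decision rule), and the function name and dictionary representation are illustrative only.

```python
import numpy as np

def power_method_estimate(relative_freq, power):
    """Raise each tabulated relative frequency to `power`.

    relative_freq: dict mapping class configurations to relative
        frequencies tabulated from a classification (hypothetical format).
    power: exponent applied to every frequency.
    """
    configs = list(relative_freq)
    raised = np.array([relative_freq[c] for c in configs], dtype=float) ** power
    # Renormalizing is an assumption of this sketch, not a documented step.
    raised /= raised.sum()
    return dict(zip(configs, raised))
```

A power below one flattens the tabulated distribution toward uniform, and a power of zero makes it exactly uniform over the observed configurations; a power above one sharpens it.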

With the Power Method, an estimate of the context
distribution is tabulated from a uniform-priors
non-contextual classification of the training set. Then
contextual classifications of the training set and test set
are performed using a power of the tabulated context
distribution. To achieve the best possible results, a
second iteration of this procedure must generally be
performed, using a context distribution estimate tabulated
from the training set of the first iteration
classification. Unfortunately, no reliable predictor has
been found for the optimal power to be used for the first
or second iteration. It is not even the case that the most
accurate first iteration classification will provide in
general the best template for the second iteration.
Further, on certain data sets, a spectral-class
context-distribution estimate produces the best results,
while on other data sets an information-class formulation
works better. Despite the good results possible with the
Power Method, these ambiguities make this method difficult
to use, and not useful for practical applications. A search
for a better generally applicable method for estimating the
context distribution has led to the unbiased estimation
technique described in the next section.


IV. CONTEXT DISTRIBUTION ESTIMATION: 
UNBIASED ESTIMATOR 

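One tactic for seeking an optimal estimate of the context
distribution, $C(\boldsymbol{\theta}^p)$, is to look for an
estimator function, $T_{\boldsymbol{\theta}^p}(\mathbf{X})$,
which minimizes the mean-squared error given by

$$\mathrm{MSE} = E\left\{ \left[ T_{\boldsymbol{\theta}^p}(\mathbf{X}) - C(\boldsymbol{\theta}^p) \right]^2 \right\}. \tag{8}$$

Equation (8) can be rewritten as

$$\mathrm{MSE} = \mathrm{Var}[T_{\boldsymbol{\theta}^p}(\mathbf{X})] + b^2 \tag{9}$$

where $\mathrm{Var}[T_{\boldsymbol{\theta}^p}(\mathbf{X})]$
is the variance of the estimate
$T_{\boldsymbol{\theta}^p}(\mathbf{X})$ and $b$ is the bias
given by

$$b = E[T_{\boldsymbol{\theta}^p}(\mathbf{X})] - C(\boldsymbol{\theta}^p). \tag{10}$$

Finding the minimum mean-squared-error estimate is
generally a difficult task, but since bias represents a
systematic error, a reasonable approach would be to control
bias before considering the variance. The best one can do
in controlling bias is to seek an unbiased estimator, i.e.,
one for which $b = 0$.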

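As we saw in the previous section, the classify-and-count
method performed poorly in tests on real Landsat data sets.
One reason for this is that the estimate can be
statistically biased. To prove this, consider the
classification model as presented in Section II. In
addition to the symbol definitions given there, we make the
following definitions. Let $\hat{\boldsymbol{\theta}}$ be
the vector of classifications

$$\hat{\boldsymbol{\theta}} = [\hat{\theta}_{ij} \mid i = 1, 2, \ldots, N_1;\ j = 1, 2, \ldots, N_2]^T$$

where $\hat{\theta}_{ij}$ is the classification estimate
from a non-contextual classification of the observation
$X_{ij}$. Let $\hat{\boldsymbol{\theta}}_{ij}$ be a p-vector
of classification estimates associated with the
observations in the p-context array, $\mathbf{X}_{ij}$.
Similarly, let $\hat{\boldsymbol{\theta}}^p$ be such an
estimate associated with an arbitrary p-context array,
$\mathbf{X}^p$. Let $\boldsymbol{\theta}^p \in \Omega^p$
represent an arbitrary p-vector of classes. The
classify-and-count method can be described by the following
estimator function for $C(\boldsymbol{\theta}^p)$:

$$T_{\boldsymbol{\theta}^p}(\mathbf{X}) = \hat{C}(\boldsymbol{\theta}^p) = \frac{1}{N} \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} I(\hat{\boldsymbol{\theta}}_{ij}, \boldsymbol{\theta}^p) \tag{11}$$

where

$$I(\hat{\boldsymbol{\theta}}_{ij}, \boldsymbol{\theta}^p) = \begin{cases} 1, & \text{if } \hat{\boldsymbol{\theta}}_{ij} = \boldsymbol{\theta}^p \\ 0, & \text{otherwise.} \end{cases}$$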


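The expected value of $T_{\boldsymbol{\theta}^p}(\mathbf{X})$
is then

$$\begin{aligned}
E[T_{\boldsymbol{\theta}^p}(\mathbf{X})] &= \frac{1}{N} \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} E\big[ I(\hat{\boldsymbol{\theta}}_{ij}, \boldsymbol{\theta}^p) \big] \\
&= \frac{1}{N} \sum_{\boldsymbol{\theta}'^p \in \Omega^p}\ \sum_{(i,j):\ \boldsymbol{\theta}_{ij} = \boldsymbol{\theta}'^p}\ \int_{\{\mathbf{X}^p:\ \hat{\boldsymbol{\theta}}^p = \boldsymbol{\theta}^p\}} f(\mathbf{X}^p \mid \boldsymbol{\theta}'^p)\, d\mathbf{X}^p \\
&= \sum_{\boldsymbol{\theta}'^p \in \Omega^p} C(\boldsymbol{\theta}'^p) \int_{\{\mathbf{X}^p:\ \hat{\boldsymbol{\theta}}^p = \boldsymbol{\theta}^p\}} f(\mathbf{X}^p \mid \boldsymbol{\theta}'^p)\, d\mathbf{X}^p.
\end{aligned} \tag{12}$$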

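Equations (10) and (12) show that the bias of the
classify-and-count method is the difference between
$C(\boldsymbol{\theta}^p)$ and a weighted sum of the
$C(\boldsymbol{\theta}'^p)$. Note that this bias is
independent of N, and cannot be reduced by increasing
sample size. The bias can be non-zero or zero, depending on
the values of $C(\boldsymbol{\theta}'^p)$ and the integrals
in (12). To show this explicitly, let's consider the simple
special case of a two-class problem (m=2) estimating
non-contextual relative frequencies of classes (p=1) for
univariate random observations (n=1). Let the
non-contextual classifier used to produce
$\hat{\boldsymbol{\theta}}$ be the uniform-priors
maximum-likelihood classifier with the decision rule:

$d(X_{ij})$ = the action $a$ which maximizes $f(X_{ij} \mid a)$

for all $a \in \{\omega_1, \omega_2\}$. The densities
$f(X_{ij} \mid a)$ are assumed to be normal with mean and
variance $\mu_1 = -1$ and $\sigma_1^2 = 1$ for class
$\omega_1$ and mean and variance $\mu_2 = 1$ and
$\sigma_2^2 = 1$ for class $\omega_2$, so the classifier
chooses $\omega_1$ whenever $X < 0$. For class $\omega_1$ we
have, with erf denoting the standard error function:

$$\begin{aligned}
E[T_{\omega_1}(\mathbf{X})] &= \sum_{k=1}^{2} C(\omega_k) \int_{\{X:\ d(X) = \omega_1\}} f(X \mid \omega_k)\, dX \\
&= \sum_{k=1}^{2} C(\omega_k) \int_{-\infty}^{0} f(X \mid \omega_k)\, dX \\
&= C(\omega_1) \left[ \tfrac{1}{2} + \tfrac{1}{2}\, \mathrm{erf}\!\left( \tfrac{1}{\sqrt{2}} \right) \right] + C(\omega_2) \left[ \tfrac{1}{2} - \tfrac{1}{2}\, \mathrm{erf}\!\left( \tfrac{1}{\sqrt{2}} \right) \right] \\
&= .84\, C(\omega_1) + .16\, C(\omega_2).
\end{aligned} \tag{13}$$

The sum in (13) is equal to $C(\omega_1)$ only if
$C(\omega_1) = C(\omega_2) = \tfrac{1}{2}$. For any other
values of $C(\omega_1)$ and $C(\omega_2)$ the estimate is
biased. Similar comments apply for class $\omega_2$ where
we have

$$E[T_{\omega_2}(\mathbf{X})] = .16\, C(\omega_1) + .84\, C(\omega_2). \tag{14}$$

We have shown, then, that the classify-and-count method
does indeed generally produce biased estimates of the
context distribution.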
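
The two-class example can be checked numerically. The sketch below evaluates the expected classify-and-count estimate of the relative frequency of the first class for two unit-variance normal classes; the function name and default parameters are illustrative, and the sketch assumes equal variances with the first mean below the second, so that the maximum likelihood boundary is the midpoint of the means.

```python
import math

def expected_classify_and_count(c1, mu1=-1.0, mu2=1.0, sigma=1.0):
    """Expected classify-and-count estimate of C(omega_1), as in (13).

    Assumes equal variances and mu1 < mu2, so the uniform-priors
    maximum likelihood rule picks omega_1 for x below the midpoint.
    """
    boundary = 0.5 * (mu1 + mu2)

    def cdf(x, mu):
        # Normal cumulative distribution via the standard error function.
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    p_correct = cdf(boundary, mu1)   # P(decide omega_1 | omega_1), about .84
    p_leak = cdf(boundary, mu2)      # P(decide omega_1 | omega_2), about .16
    return p_correct * c1 + p_leak * (1.0 - c1)
```

Evaluating the function at a true relative frequency of 0.5 returns 0.5, while at 0.9 it returns a value below 0.9 regardless of the number of pixels, which is the bias discussed above.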


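The unbiased estimator we have adopted can be most easily
described by first considering the p=1 case and then
generalizing to the arbitrary p-context array. For p=1, we
examine the equation

$$\int h_k(X) \left[ \sum_{l=1}^{m} f(X \mid \omega_l)\, C(\omega_l) \right] dX = \sum_{l=1}^{m} \left[ \int h_k(X)\, f(X \mid \omega_l)\, dX \right] C(\omega_l) \tag{15}$$

where $m$ is the number of classes; $f(X \mid \omega_l)$,
$l = 1, 2, \ldots, m$, are the class-conditional densities
described earlier; and the functions $h_k(X)$,
$k = 1, 2, \ldots, m$, can be any set of $m$ linearly
independent functions. Equation (15) is valid provided all
indicated sums and integrals are well defined, which will,
for example, be the case when all of the functions in (15)
are bounded. The functions $C(\omega_l)$ and
$f(X \mid \omega_l)$ are always bounded because
$C(\omega_l)$ is a relative frequency function and
$f(X \mid \omega_l)$ is a multivariate normal density
function. The functions $h_k(X)$ considered in the
following development will also always be bounded.

The left-hand side of (15), which looks like the expected
value of $h_k(X)$, can be estimated from the data
$\mathbf{X}$ as follows:

$$\int h_k(X) \left[ \sum_{l=1}^{m} f(X \mid \omega_l)\, C(\omega_l) \right] dX \approx \frac{1}{N} \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} h_k(X_{ij}) \equiv \bar{h}_k(\mathbf{X}) \tag{16}$$

where $N$, $N_1$ and $N_2$ are as defined in Figure 1, and
$k \in \{1, 2, \ldots, m\}$. Applying (15) and (16) $m$
times, once for each class, we can write

$$\begin{bmatrix} \bar{h}_1(\mathbf{X}) \\ \bar{h}_2(\mathbf{X}) \\ \vdots \\ \bar{h}_m(\mathbf{X}) \end{bmatrix}
\approx
\begin{bmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1m} \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2m} \\ \vdots & \vdots & & \vdots \\ \gamma_{m1} & \gamma_{m2} & \cdots & \gamma_{mm} \end{bmatrix}
\begin{bmatrix} C(\omega_1) \\ C(\omega_2) \\ \vdots \\ C(\omega_m) \end{bmatrix} \tag{17a}$$

where

$$\gamma_{kl} = \int h_k(X)\, f(X \mid \omega_l)\, dX. \tag{17b}$$

This can be more succinctly represented in vector-matrix
notation as

$$\bar{\mathbf{h}} \approx \Gamma\, \mathbf{C}. \tag{18}$$

Now $\mathbf{C}$ can be estimated by solving

$$\mathbf{T} = \Gamma^{-1}\, \bar{\mathbf{h}} \tag{19}$$

where $\mathbf{T} = (T_1(\mathbf{X}), T_2(\mathbf{X}), \ldots, T_m(\mathbf{X}))^T$
is the vector equivalent of
$T_{\boldsymbol{\theta}^p}(\mathbf{X})$ in (8), (9) and (10).

To show that $\mathbf{T}$ is indeed an unbiased estimator
for $\mathbf{C}$, we note that

$$E(\mathbf{T}) = E(\Gamma^{-1}\, \bar{\mathbf{h}}) = \Gamma^{-1} E(\bar{\mathbf{h}}). \tag{20}$$

Looking at $E(\bar{\mathbf{h}})$ element by element we have

$$E[\bar{h}_k(\mathbf{X})] = E\left[ \frac{1}{N} \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} h_k(X_{ij}) \right] = \frac{1}{N} \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \int h_k(X_{ij})\, f(X_{ij} \mid \theta_{ij})\, dX_{ij} \tag{21a}$$

$$\begin{aligned}
&= \frac{1}{N} \sum_{l=1}^{m}\ \sum_{(i,j):\ \theta_{ij} = \omega_l} \int h_k(X_{ij})\, f(X_{ij} \mid \omega_l)\, dX_{ij} \\
&= \sum_{l=1}^{m} C(\omega_l) \int h_k(X)\, f(X \mid \omega_l)\, dX.
\end{aligned} \tag{21b}$$

Thus

$$E(\bar{\mathbf{h}}) = \Gamma\, \mathbf{C}$$

and (20) becomes

$$E(\mathbf{T}) = \Gamma^{-1} E(\bar{\mathbf{h}}) = \Gamma^{-1}\, \Gamma\, \mathbf{C} = \mathbf{C} \tag{22}$$

proving that $\mathbf{T}$ is an unbiased estimator for
$\mathbf{C}$.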

It is convenient to use a function of the class-conditional
densities for the functions $h_k(X)$. More specifically,
let $h_k(X) = (2\pi)^{n/2} f(X \mid \omega_k)$ and write
(17b) as

$$\gamma_{kl} = (2\pi)^{n/2} \int f(X \mid \omega_k)\, f(X \mid \omega_l)\, dX$$

where $n$ is the dimensionality of $X$. Assuming the
$\omega_k$ are normally distributed spectral classes with
respective mean vectors $\mu_k$ and covariance matrices
$\Sigma_k$ ($k = 1, 2, \ldots, m$), we find

$$\gamma_{kl} = \left| \det(\Sigma_k + \Sigma_l) \right|^{-1/2} \exp\left\{ -\tfrac{1}{2} (\mu_k - \mu_l)^T (\Sigma_k + \Sigma_l)^{-1} (\mu_k - \mu_l) \right\}. \tag{23}$$

When the $\omega_k$ are information classes, the
$\gamma_{kl}$ are weighted sums of terms of the form given
in (23).

When the estimate is made in terms of information classes,
estimates must be made of the weights used to form the
weighted sum of the class-conditional normal densities of
the spectral subclasses. For each information class, the
weights are estimated by using the unbiased estimator with
p=1 for the spectral classes which make up the information
class being considered.
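
The p=1 estimator for normally distributed spectral classes can be sketched with NumPy as follows. This is a minimal illustration under the assumption that class means and covariance matrices are already known from training; the function names are hypothetical, and no claim is made that the original software was organized this way.

```python
import numpy as np

def gamma_matrix(means, covs):
    """Matrix of gamma[k, l] from (23) for normal classes."""
    m = len(means)
    gamma = np.empty((m, m))
    for k in range(m):
        for l in range(m):
            s = covs[k] + covs[l]
            d = means[k] - means[l]
            gamma[k, l] = (np.linalg.det(s) ** -0.5
                           * np.exp(-0.5 * d @ np.linalg.solve(s, d)))
    return gamma

def h_values(x, means, covs):
    """h_k(x) = (2*pi)^(n/2) f(x | omega_k) for each row of x and class k."""
    cols = []
    for mu, cov in zip(means, covs):
        d = x - mu
        maha = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)
        # The (2*pi)^(n/2) factor cancels the density's normalizing constant.
        cols.append(np.linalg.det(cov) ** -0.5 * np.exp(-0.5 * maha))
    return np.column_stack(cols)

def unbiased_prior_estimate(x, means, covs):
    """T = Gamma^{-1} h_bar: average h over the pixels, then solve."""
    h_bar = h_values(x, means, covs).mean(axis=0)
    return np.linalg.solve(gamma_matrix(means, covs), h_bar)
```

Here `x` holds one pixel per row (shape N by n). Because the estimator is a linear function of sample averages, individual components can fall slightly outside the interval from 0 to 1 on small samples; how the original work handled such values is not stated, so this sketch leaves them unmodified.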

The calculation of the estimate of $\mathbf{C}$ can proceed
in one of two alternative ways. The vector
$\bar{\mathbf{h}}$ can be calculated for the entire image
(as in (17a)), then multiplied by $\Gamma^{-1}$ to give
$\mathbf{T} \approx \mathbf{C}$; or, as the $h_k(X_{ij})$
are calculated at each data point (pixel), the product with
$\Gamma^{-1}$ can be performed. The average of this product
over the entire image is then
$\mathbf{T} \approx \mathbf{C}$. The methods are completely
equivalent; the difference between them amounts to a change
in order of summation. However, the second method must be
used when this unbiased estimator is extended to the
arbitrary p-context array case, because the use of the
first method for large values of p would require an
impractical amount of storage. In calculating the estimate
of $C(\boldsymbol{\theta}^p)$ at each image data point
using the second method, individual unbiased estimates of
the prior probabilities of each class are made for each
position in the p-context array, and cross-products of
these prior probabilities are taken to form the unbiased
estimate of $C(\boldsymbol{\theta}^p)$ based on that image
point. To save computer storage space, the cross-products
having values below a specified threshold are ignored. The
estimate of $C(\boldsymbol{\theta}^p)$ for the entire image
is the average of the estimates of
$C(\boldsymbol{\theta}^p)$ based on all the individual image
points in the scene.
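
One possible reading of this per-pixel procedure is sketched below: each pixel carries an m-vector of single-pixel estimates, the vectors at the p positions of a context array are combined by an outer (cross) product, and the products are averaged over the image. The array layout, the function name, the skipping of incomplete border contexts, and the use of the absolute value in the threshold test are all assumptions of this sketch.

```python
import numpy as np

def context_estimate(point_estimates, offsets, threshold=0.0):
    """Average of per-pixel cross-products over an image.

    point_estimates: array (rows, cols, m); entry [i, j] is the
        single-pixel estimate vector for pixel (i, j).
    offsets: list of p (drow, dcol) pairs, pixel to be classified last.
    Returns an array with p axes of length m, indexed by configuration.
    """
    rows, cols, m = point_estimates.shape
    total = np.zeros((m,) * len(offsets))
    count = 0
    for i in range(rows):
        for j in range(cols):
            pos = [(i + di, j + dj) for di, dj in offsets]
            if not all(0 <= r < rows and 0 <= c < cols for r, c in pos):
                continue  # incomplete context at the border (assumption)
            cross = point_estimates[pos[0]]
            for r, c in pos[1:]:
                cross = np.multiply.outer(cross, point_estimates[r, c])
            # Drop small cross-products, mirroring the storage-saving step.
            total += np.where(np.abs(cross) < threshold, 0.0, cross)
            count += 1
    return total / count
```

As a sanity check, when the per-pixel vectors are exact one-hot indicators of the true classes, the result equals the context distribution tabulated directly from the true classification. A dense array of this size is only practical for small m and p; a sparse accumulator would be needed otherwise.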

The unbiased estimator can be modified to provide an
adaptive estimate of the context distribution. The local
context distribution estimate for a particular
$n_1 \times n_2$ block of image data is made from an
$m_1 \times m_2$ block ($m_1 \geq n_1$ and $m_2 \geq n_2$).
The $n_1 \times n_2$ block of image data is then classified
using this local estimate of the context distribution. This
process is repeated until the entire data set is
classified. Better results have generally been obtained
when $m_1 > n_1$ and $m_2 > n_2$. If $m_1 = n_1$ and
$m_2 = n_2$, the context distribution estimate is not
accurate for the pixels at the edges of the image data
block being classified. Tests on three 50-pixel-square
Landsat data sets have indicated good choices for $n_1$ and
$n_2$ ranging from 10 up to 25 with the corresponding
choices for $m_1$ and $m_2$ being 8 to 10 larger than the
values chosen for $n_1$ and $n_2$.
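
The block bookkeeping for the adaptive scheme can be sketched as a generator that pairs each block to be classified with a larger estimation window. Centering the window on the block follows the description of the experiments; shifting the window back inside the image at the edges is an assumption of this sketch, as are the function name and the half-open bound convention.

```python
def adaptive_blocks(rows, cols, n1, n2, m1, m2):
    """Yield (classify_window, estimate_window) pairs.

    Each window is (row_start, row_stop, col_start, col_stop) with
    half-open bounds. Requires m1 >= n1 and m2 >= n2.
    """
    for r0 in range(0, rows, n1):
        for c0 in range(0, cols, n2):
            r1, c1 = min(r0 + n1, rows), min(c0 + n2, cols)
            # Center the m1 x m2 window on the block, then shift it back
            # inside the image if it would overhang an edge (assumption).
            er0 = min(max(r0 - (m1 - n1) // 2, 0), max(rows - m1, 0))
            ec0 = min(max(c0 - (m2 - n2) // 2, 0), max(cols - m2, 0))
            yield ((r0, r1, c0, c1),
                   (er0, min(er0 + m1, rows), ec0, min(ec0 + m2, cols)))
```

For a 50-pixel-square image with 25 by 25 blocks and 35 by 35 estimation windows this yields four pairs; the context distribution would be estimated inside each estimation window and then used to classify only the pixels of the paired block.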


V. CONTEXTUAL CLASSIFICATION RESULTS
EMPLOYING THE UNBIASED ESTIMATOR

Table 3 presents the accuracies resulting from contextual
classifications for three Landsat data sets using
four-nearest-neighbor (4nn) estimates of the context
distribution. The results using the spectral-class
formulation are shown for the whole scene (non-adaptive)
version and for an adaptive version employing local context
distribution estimates for 25x25 pixel blocks made from the
same 25x25 pixel block. The results using the
information-class formulation are shown for an adaptive
version employing estimates for various $n_1 \times n_2$
pixel blocks made from an $m_1 \times m_2$ pixel block
centered on each $n_1 \times n_2$ pixel block. The
uniform-priors non-contextual classification results are
given for reference.

Figure 3 shows computer generated gray-scale 
maps of classifications of the Tippecanoe County, Indiana,
Landsat data set. The contextual classification looks
visually closer to the reference image than might
be expected based on the accuracy improvement over
the non-contextual classifications. This is due to the 
tendency of the contextual information to provide a 
smoothing effect, making classification maps that are 
not only more accurate, but also more pleasing to the 
eye. 

The adaptive information-class formulation performs
as well as or better than any other formulation
shown. As noted earlier in the discussion of the 
ground-truth-guided method, the information-class 
formulation has the further advantage of having sub- 
stantially fewer non-zero elements in the context dis- 
tribution estimate, causing contextual classifications 
using an information-class formulation to require less 
than half the computer time required for contextual 
classifications using a corresponding spectral class 
formulation. 


VI. CONCLUDING REMARKS 

It had been shown earlier in this research 1 ’** 1 that
the contextual classifier can provide improved 
classification performance, as compared to non- 
contextual classification, when accurate characteriza- 
tions of the context distribution are available. The 
ground-truth-guided method has been shown to pro- 
vide sufficiently accurate estimates of the context 
distribution, but suffers the disadvantage of requiring 
sizeable amounts of spatially contiguous ground truth. 
The unbiased estimator described herein overcomes 
this disadvantage, providing good estimates of the con- 
text distribution while requiring no more ground truth 
data than is required for a non-contextual 
classification. Furthermore, the unbiased estimator is
amenable to an adaptive implementation so that the
resulting context distribution estimate is more closely
tailored to local conditions in the image data.

Table 3. Comparison of the contextual classifier using various unbiased
estimator formulations and the uniform-priors non-contextual classifier.

                                                                      % Accuracy
Classification                                                     Overall  Average-by-Class

Hodgeman County, Kansas, 50-pixel-square Landsat
(evaluated over lines and columns 0 through 50)
  uniform-priors non-contextual                                      82.0       75.9
  4nn unbiased, spectral class, whole image est. (nonadaptive)       83.1       75.8
  4nn unbiased, spectral class, adaptive est., 25x25 from 25x25      84.0       77.8
  4nn unbiased, information class, adaptive est., 25x25 from 35x35   84.0       78.0

Monroe County, Indiana, 50-pixel-square Landsat
  uniform-priors non-contextual                                      83.1       82.7
  4nn unbiased, spectral class, whole image est. (nonadaptive)       84.4       84.4
  4nn unbiased, spectral class, adaptive est., 25x25 from 25x25      84.3       83.9
  4nn unbiased, information class, adaptive est., 17x17 from 25x25   88.0       88.3

Tippecanoe County, Indiana, 50-pixel-square Landsat
  uniform-priors non-contextual                                      81.8       83.4
  4nn unbiased, spectral class, whole image est. (nonadaptive)       88.2       87.9
  4nn unbiased, spectral class, adaptive est., 25x25 from 25x25      86.7       88.1
  4nn unbiased, information class, adaptive est., 25x25 from 25x25   88.2       89.1
  4nn unbiased, information class, adaptive est., 10x10 from 20x20   86.9       89.7

Figure 3. Visual comparison of classification results, Tippecanoe County,
Indiana, Landsat data set: (a) uniform-priors no-context, (b) estimated-priors
no-context, (c) four-nearest-neighbor adaptive (17x17 from 27x27) unbiased
estimator, and (d) reference image.